Measuring Sociality in Driving Interaction
Interacting with other human road users is one of the most challenging tasks
for autonomous vehicles. To drive congruently with humans, it is essential to
recognize and comprehend sociality, encompassing both the implicit social norms
and the individualized social preferences of human drivers. To understand and quantify
the complex sociality in driving interactions, we propose a Virtual-Game-based
Interaction Model (VGIM) that is parameterized by a social preference
measurement, Interaction Preference Value (IPV). The IPV is designed to capture
the driver's relative inclination towards individual rewards over group
rewards. A method for identifying the IPV from observed driving trajectories is also
developed, with which we assessed human drivers' IPV using driving data
recorded in a typical interactive driving scenario, the unprotected left turn.
Our findings reveal that (1) human drivers exhibit particular social preference
patterns while undertaking specific tasks, such as turning left or proceeding
straight; (2) human drivers may strategically take competitive actions in order
to coordinate with others. Finally, we discuss the potential
of learning sociality-aware navigation from human demonstrations by
incorporating a rule-based humanlike IPV expressing strategy into VGIM and
optimization-based motion planners. Simulation experiments demonstrate that (1)
IPV identification improves the motion prediction performance in interactive
driving scenarios and (2) the dynamic IPV expressing strategy extracted from
human driving data makes it possible to reproduce humanlike coordination
patterns in driving interactions.
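The IPV's trade-off between individual and group rewards can be sketched with an angle-based parameterization, as in classic social-value-orientation models. The cosine/sine weighting, the toy rewards, and the regret-based identification below are illustrative assumptions for this sketch, not the paper's actual formulation:

```python
import math

def combined_reward(ipv, r_self, r_group):
    # IPV as an angle: cos weights the individual reward, sin the group reward
    # (a common social-value-orientation form, assumed here for illustration)
    return math.cos(ipv) * r_self + math.sin(ipv) * r_group

def identify_ipv(observed_action, action_rewards, candidates):
    """Pick the candidate IPV under which the observed action loses the
    least reward relative to the best response (minimum-regret fit)."""
    def regret(ipv):
        values = {a: combined_reward(ipv, rs, rg)
                  for a, (rs, rg) in action_rewards.items()}
        return max(values.values()) - values[observed_action]
    return min(candidates, key=regret)

# Toy unprotected-left-turn choice: "go" favors self, "yield" favors the group.
action_rewards = {"go": (1.0, 0.2), "yield": (0.3, 1.0)}
candidates = [i * math.pi / 8 for i in range(5)]   # IPV grid from 0 to pi/2
ipv = identify_ipv("yield", action_rewards, candidates)
```

A driver observed yielding is assigned a more prosocial (larger) IPV than one observed going, which is the qualitative behavior the identification method needs.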
Chinese Text Recognition with A Pre-Trained CLIP-Like Model Through Image-IDS Aligning
Scene text recognition has been studied for decades due to its broad
applications. However, despite Chinese characters possessing different
characteristics from Latin characters, such as complex inner structures and a
large number of categories, few methods have been proposed for Chinese Text
Recognition (CTR). In particular, the large number of categories poses
challenges for dealing with zero-shot and few-shot Chinese characters. In this paper, inspired
by the way humans recognize Chinese texts, we propose a two-stage framework for
CTR. First, we pre-train a CLIP-like model by aligning printed character
images and Ideographic Description Sequences (IDS). This pre-training stage
simulates how humans recognize Chinese characters and yields a canonical
representation of each character. Subsequently, the learned representations are
employed to supervise the CTR model, such that traditional single-character
recognition can be improved to text-line recognition through image-IDS
matching. To evaluate the effectiveness of the proposed method, we conduct
extensive experiments on both Chinese character recognition (CCR) and CTR. The
experimental results demonstrate that the proposed method performs best in CCR
and outperforms previous methods in most scenarios of the CTR benchmark. It is
worth noting that the proposed method can recognize zero-shot Chinese
characters in text images without fine-tuning, whereas previous methods require
fine-tuning when new classes appear. The code is available at
https://github.com/FudanVI/FudanOCR/tree/main/image-ids-CTR.
Comment: ICCV 202
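The alignment objective of such CLIP-like pre-training can be sketched as a symmetric contrastive (InfoNCE) loss over paired image and IDS embeddings. The encoders are elided, and this loss form is an assumption based on CLIP, not the paper's exact recipe:

```python
import numpy as np

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def clip_alignment_loss(img_emb, ids_emb, temperature=0.07):
    """Symmetric InfoNCE: matching (image, IDS) pairs lie on the diagonal
    of the similarity matrix and should dominate their row and column."""
    img = l2_normalize(img_emb)
    ids = l2_normalize(ids_emb)
    logits = img @ ids.T / temperature          # (N, N) cosine similarities
    labels = np.arange(len(logits))
    def ce(lg):
        # row-wise cross-entropy with the diagonal as the target class
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()
    # average the image-to-IDS and IDS-to-image directions
    return 0.5 * (ce(logits) + ce(logits.T))
```

Correctly paired batches yield a lower loss than mispaired ones, which is what drives the encoders toward the canonical per-character representation described above.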
Orientation-Independent Chinese Text Recognition in Scene Images
Scene text recognition (STR) has attracted much attention due to its broad
applications. Previous works focus on recognizing Latin text images with
complex backgrounds by introducing language models or other auxiliary networks.
Unlike Latin texts, many vertical Chinese texts appear in natural scenes, which
poses difficulties for current state-of-the-art STR methods. In this paper, we
make the first attempt
to extract orientation-independent visual features by disentangling content and
orientation information of text images, thus recognizing both horizontal and
vertical texts robustly in natural scenes. Specifically, we introduce a
Character Image Reconstruction Network (CIRN) to recover corresponding printed
character images with disentangled content and orientation information. We
conduct experiments on a scene dataset for benchmarking Chinese text
recognition, and the results demonstrate that the proposed method can indeed
improve performance through disentangling content and orientation information.
To further validate the effectiveness of our method, we additionally collect a
Vertical Chinese Text Recognition (VCTR) dataset. The experimental results show
that the proposed method achieves 45.63% improvement on VCTR when introducing
CIRN to the baseline model.
Comment: IJCAI 202
An Analytical Model for Predicting the Stress Distributions within Single-Lap Adhesively Bonded Beams
An analytical model for predicting the stress distributions within single-lap adhesively bonded beams under tension is presented in this paper. By combining the governing equations of each adherend with the joint kinematics, the overall system of governing equations is obtained. Both the adherends and the adhesive are assumed to be under plane strain conditions. With suitable boundary conditions, the stress distribution of the adhesive in the longitudinal direction is determined.
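For orientation, the simplest classical model in this family is the Volkersen shear-lag analysis of a single-lap joint, shown here only as background; the paper's plane-strain formulation is more general:

```latex
% Volkersen shear-lag ODE for the adhesive shear stress \tau(x)
% (a classical simplification, not the model of this paper)
\frac{d^{2}\tau}{dx^{2}} - \lambda^{2}\,\tau = 0, \qquad
\lambda^{2} = \frac{G_a}{t_a}\left(\frac{1}{E_1 t_1} + \frac{1}{E_2 t_2}\right)
```

where $G_a$ and $t_a$ are the adhesive shear modulus and thickness, and $E_i$, $t_i$ are the adherend moduli and thicknesses; the shear stress peaks at the overlap ends, which is why boundary conditions dominate the solution.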
On the Opportunities and Challenges of Offline Reinforcement Learning for Recommender Systems
Reinforcement learning serves as a potent tool for modeling dynamic user
interests within recommender systems, garnering increasing research attention
of late. However, a significant drawback persists: its poor data efficiency,
stemming from its interactive nature. The training of reinforcement
learning-based recommender systems demands expensive online interactions to
amass adequate trajectories, essential for agents to learn user preferences.
This inefficiency renders reinforcement learning-based recommender systems a
formidable undertaking, necessitating the exploration of potential solutions.
Recent strides in offline reinforcement learning present a new perspective.
Offline reinforcement learning empowers agents to glean insights from offline
datasets and deploy learned policies in online settings. Given that recommender
systems possess extensive offline datasets, the framework of offline
reinforcement learning aligns with them seamlessly. Despite being a burgeoning field,
works centered on recommender systems utilizing offline reinforcement learning
remain limited. This survey aims to introduce and delve into offline
reinforcement learning within recommender systems, offering an inclusive review
of existing literature in this domain. Furthermore, we strive to underscore
prevalent challenges, opportunities, and future pathways, poised to propel
research in this evolving field.
Comment: under review
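As a minimal illustration of the offline setting the survey describes, the sketch below fits a tabular Q-function from a fixed batch of logged (state, action, reward, next-state) tuples with no online interaction, then deploys the greedy policy. The toy log and the plain fitted-Q update are assumptions for illustration; practical offline RL adds conservatism (e.g. CQL) to handle actions missing from the log:

```python
import numpy as np

def offline_q_learning(dataset, n_states, n_actions,
                       gamma=0.9, lr=0.5, epochs=50):
    """Batch Q-learning over a logged dataset; no environment is queried."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(epochs):
        for s, a, r, s2 in dataset:
            target = r + gamma * Q[s2].max()   # bootstrap from logged next state
            Q[s, a] += lr * (target - Q[s, a])
    return Q

# Toy interaction log: in state 0, recommending item 1 earned a click (1.0).
dataset = [(0, 1, 1.0, 1), (0, 0, 0.0, 1), (1, 0, 0.0, 0), (1, 1, 0.0, 0)]
Q = offline_q_learning(dataset, n_states=2, n_actions=2)
policy = Q.argmax(axis=1)   # greedy policy for online deployment
```

The expensive online trajectory collection is replaced entirely by replaying the fixed log, which is the core appeal of offline RL for recommenders.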
Human Health Indicator Prediction from Gait Video
Body Mass Index (BMI), age, height and weight are important indicators of
human health conditions, which can provide useful information for plenty of
practical purposes, such as health care, monitoring and re-identification. Most
existing methods of health indicator prediction mainly use front-view body or
face images. These inputs are hard to obtain in daily life, and their strict
requirements on view and pose often make the models less robust. In this
paper, we propose to employ gait videos to predict
health indicators, which are more prevalent in surveillance and home monitoring
scenarios. However, the study of health indicator prediction from gait videos
using deep learning has been hindered by the small amount of open-source data.
To address this issue, we analyse the similarity and relationship between pose
estimation and health indicator prediction tasks, and then propose a paradigm
enabling deep learning for small health indicator datasets by pre-training on
the pose estimation task. Furthermore, to better suit the health indicator
prediction task, we propose the Global-Local Aware aNd Centrosymmetric
Encoder (GLANCE) module. It first extracts local and global features by
progressive convolutions and then fuses multi-level features by a
centrosymmetric double-path hourglass structure in two different ways.
Experiments demonstrate that the proposed paradigm achieves state-of-the-art
results for predicting health indicators on MoVi, and that the GLANCE module is
also beneficial for pose estimation on 3DPW.